Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't type check most function bodies if ignoring errors #14150

Merged
merged 26 commits into from
Apr 24, 2023
Merged

Conversation

JukkaL
Copy link
Collaborator

@JukkaL JukkaL commented Nov 19, 2022

If errors are ignored, type checking function bodies often can have no effect. Remove function bodies after parsing to speed up type checking.

Methods that define attributes have an externally visible effect even if errors are ignored. The body of any method that assigns to any attribute is preserved to deal with this (even if it doesn't actually define a new attribute). Most methods don't assign to an attribute, so stripping bodies is still effective for methods.

There are a couple of additional interesting things in the implementation:

  1. We need to know whether an abstract method has a trivial body (e.g. just ...) to check super() method calls. The approach here is to preserve such trivial bodies and treat them differently from no body at all.
  2. Stubgen analyzes the bodies of functions to e.g. infer some return types. As a workaround, explicitly preserve full ASTs when using stubgen.

The main benefit is faster type checking when using installed packages with inline type information (PEP 561). Errors are ignored in this case, and it's common to have a large number of third-party code to type check. For example, a self check (code under mypy/) is now about 20% faster, with a compiled mypy on Python 3.11.

Another, more subtle benefit is improved reliability. A third-party library may have some code that triggers a mypy crash or an invalid blocking error. If bodies are stripped, often the error will no longer be triggered, since the amount code to type check is much lower.

@github-actions

This comment has been minimized.

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I like the idea, left couple comments (not a full review).

self.lvalue = False
self.found = False

def visit_assignment_stmt(self, s: AssignmentStmt) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you also need to check assignment expression? (i.e. walrus a.k.a :=)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't support assigning to an attribute.

):
# We only strip method bodies if they don't assign to an attribute, as
# this may define an attribute which has an externally visible effect.
visitor = FindAttributeAssign()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to strip statements after last assignment to attribute? This could make a bit more perf gain.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be possible! I'd rather not do it in this PR to keep it simple (and the impact is probably pretty minor), but it's seems like a promising follow-up improvement to investigate.

Copy link
Collaborator

@cdce8p cdce8p left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got a chance to test the PR on a full run with Home Assistant (with all dependencies installed). The performance improvement is noticeable. Both with the uncompiled version on Python 3.9 - perviously: ~11min - now: ~7:30min 🚀

Unfortunately, I also got a few new error messages which the primer didn't pick up on.

homeassistant/components/nibe_heatpump/__init__.py:267: error: "Coroutine[Any, Any, AsyncIterator[Coil]]" has no attribute "__aiter__" (not async iterable)  [attr-defined]
homeassistant/components/nibe_heatpump/__init__.py:267: note: Maybe you forgot to use "await"?
homeassistant/components/google/calendar.py:350: error: "Coroutine[Any, Any, AsyncIterator[ListEventsResponse]]" has no attribute "__anext__"  [attr-defined]
homeassistant/components/google/calendar.py:350: note: Maybe you forgot to use "await"?
homeassistant/components/homekit_controller/config_flow.py:155: error: "Coroutine[Any, Any, AsyncIterable[AbstractDiscovery]]" has no attribute "__aiter__" (not async iterable)  [attr-defined]
homeassistant/components/homekit_controller/config_flow.py:155: note: Maybe you forgot to use "await"?
homeassistant/components/amcrest/__init__.py:209: error: Return type "_AsyncGeneratorContextManager[Response]" of "async_stream_command" incompatible with return type "_AsyncGeneratorContextManager[<nothing>]" in supertype "Http"  [override]
homeassistant/components/amcrest/__init__.py:214: error: Need type annotation for "ret"  [var-annotated]
Found 5 errors in 4 files (checked 5433 source files)

One of the edge cases seem to be async iterators with yield. To reproduce it, create two files

# a.py
from typing import AsyncIterator


class L:
    async def some_func(self, i: int) -> str:
        return str(i)

    async def get_iterator(self) -> AsyncIterator[str]:
        for i in range(5):
            yield await self.some_func(i)
# b.py
from a import L

async def func(l: L) -> None:
    reveal_type(l.get_iterator)
    async for i in l.get_iterator():
        print(i)

And run mypy b.py.

b.py:5: note: Revealed type is "def () -> typing.Coroutine[Any, Any, typing.AsyncIterator[builtins.str]]"
b.py:6: error: "Coroutine[Any, Any, AsyncIterator[str]]" has no attribute "__aiter__" (not async iterable)  [attr-defined]
b.py:6: note: Maybe you forgot to use "await"?

@cdce8p
Copy link
Collaborator

cdce8p commented Nov 20, 2022

The second edge case is fairly similar, just with the addition of @asynccontextmanager.

# a.py
from contextlib import asynccontextmanager
from typing import AsyncIterator

class Parent:
    @asynccontextmanager
    async def async_func(self) -> AsyncIterator[str]:
        yield ''
# b.py
from contextlib import asynccontextmanager
from typing import AsyncIterator

from test8 import Parent

class Child(Parent):
    @asynccontextmanager
    async def async_func(self) -> AsyncIterator[str]:
        yield ''

Running mypy b.py

b.py:9: error: Return type "_AsyncGeneratorContextManager[str]" of "async_func" incompatible with return type "_AsyncGeneratorContextManager[<nothing>]" in supertype "Parent"  [override]

@JukkaL
Copy link
Collaborator Author

JukkaL commented Nov 21, 2022

@cdce8p Thanks for the detailed reports about regressions! I clearly need to investigate those cases.

These can have an externally visible effect within an async function.
The existence of yield affects the inferred return type, and a yield
from generates a blocking error.
@github-actions

This comment has been minimized.

@cdce8p
Copy link
Collaborator

cdce8p commented Dec 4, 2022

@cdce8p Thanks for the detailed reports about regressions! I clearly need to investigate those cases.

Just tested the PR with Home Assistant again. The last changes resolve all issues 🎉

@JukkaL
Copy link
Collaborator Author

JukkaL commented Dec 8, 2022

There are some performance regressions on master that I'd like to address first before merging this. Otherwise the performance impact estimate might be inflated.

@hauntsaninja
Copy link
Collaborator

@JukkaL I just fixed conflicts. What's the status of this PR? Are you ready to re-measure perf / merge in time for 1.0?

@github-actions

This comment has been minimized.

@hauntsaninja hauntsaninja mentioned this pull request Jan 26, 2023
17 tasks
@christianbundy
Copy link
Contributor

christianbundy commented Feb 6, 2023

Just wanted to quickly note that I think this will be very valuable for codebases that are migrating from untyped Python to typed Python. We use # mypy: ignore-errors in most test files, and excluding those files made our type-checking 300% as fast (i.e. our type-check time was reduced 66%, from 6 minutes to 2 minutes). I didn't test this PR, but I'm using this script which may have similar behavior.

# Exclude files with '# mypy: ignore-errors' for performance
# - Requires GNU Grep (tested with 3.8)
# - Ignores the same directories as Mypy https://mypy.readthedocs.io/en/stable/command_line.html#cmdoption-mypy-exclude
# - Also ignores hidden directories like `.git` and `.mypy_cache`
# - Does not support unnecessary whitespace in comment
exclude="$(\
    grep \
    --binary-files without-match \
    --recursive \
    --include '*.py'  \
    --include '*.pyi' \
    --exclude-dir 'site-packages' \
    --exclude-dir 'node_modules' \
    --exclude-dir '__pycache__' \
    --exclude-dir '.*' \
    --files-with-matches \
    '^# mypy: ignore[-_]errors$' \
    | sed 's/^\.\///g' \
    | tr '\n' '|' \
    | sed 's/\./\\\./g' \
    | sed 's/|$//'\
)"

mypy --exclude "$exclude"

@github-actions
Copy link
Contributor

According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉

@JukkaL
Copy link
Collaborator Author

JukkaL commented Apr 24, 2023

I finally got around to running some benchmarks. Self-checking mypy (not including mypyc) was about 10% faster when compiled, and 15% faster when interpreted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants